Generating Robust Features for Style-independent Labeling of Bibliographic Fields in Medical Journal Articles
Abstract
Bibliographical data such as title, author, affiliation, and abstract are crucial for indexing biomedical journal articles. The Medical Article Records System (MARS) has been developed at the National Library of Medicine (NLM) to automate bibliographical data extraction for MEDLINE®, the NLM’s premier database of citations to the biomedical literature. Automatic extraction of bibliographic data involves assigning logical labels (title, author, affiliation, and abstract) to homogeneous regions, or zones, on page images. While an OCR- and rule-based labeling module (called ZoneCzar) in MARS can reliably label medical journals with regular layout styles, it cannot accurately label journals with arbitrary or unusual layout styles, and new rules have to be created manually for these journals. Furthermore, OCR zoning errors, particularly merging errors, can greatly reduce the labeling accuracy of ZoneCzar. In this paper, we describe an algorithm for the automatic generation of robust features that the labeling algorithm uses to perform style-independent labeling.
Similar resources
Style-independent document labeling: design and performance evaluation
The Medical Article Records System or MARS has been developed at the U.S. National Library of Medicine (NLM) for automated data entry of bibliographical information from medical journals into MEDLINE®, the premier bibliographic citation database at NLM. Currently, a rule-based algorithm (called ZoneCzar) is used for labeling important bibliographical fields (title, author, affiliation, and abst...
Radiology, nuclear medicine, and medical imaging: a bibliometric study in Iran
Introduction: Nowadays, science mapping is considered an excellent technique for decision-makers to find solutions for problems in research planning and development. In this work, we aimed to depict a science map of “radiology, nuclear medicine, and medical imaging” in Iran. Methods: All publications indexed in Thomson Reuters Web of Science database in the fields mentioned above with at least...
Automated Labeling from Biomedical Journals published in Foreign Languages
An automated labeling (AL) module is developed to produce bibliographic records such as English title, vernacular title, author, affiliation, and English abstract from biomedical articles published in foreign language journals. Optical character recognition (OCR) output from scanned biomedical journals is used in this labeling process. Since frequently occurring words in a zone are important fe...
Automated Labeling Algorithms for Biomedical Document Images
The National Library of Medicine (NLM) has developed an automated system, named Medical Article Records System (MARS), to process bibliographic data (title, authors, affiliation, abstract, etc.) in biomedical journal articles for its MEDLINE database. This paper describes a labeling module in MARS, which automatically extracts the bibliographic data in biomedical journal articles. The label...
Citation analysis of articles in the Journal of Qom University of Medical Sciences, 1386-1391 (Solar Hijri calendar)
Background and Objectives: Given the important role of journals in presenting the most up-to-date scholarly information, rigorous scientific evaluation of such sources is essential. The present research was carried out to determine the citation status in the articles of the Journal of Qom University of Medical Sciences. Methods: This research, as a descriptive...